Search CORE

30 research outputs found

A Survey on Recognizing Textual Entailment as an NLP Evaluation

Author: Poliak Adam
Publication date: 01/01/2020

Recognizing Textual Entailment (RTE) was proposed as a unified evaluation framework to compare semantic understanding of different NLP systems. In this survey paper, we provide an overview of different approaches for evaluating and understanding the reasoning capabilities of NLP systems. We then focus our discussion on RTE by highlighting prominent RTE datasets as well as advances in RTE datasets that focus on specific linguistic phenomena that can be used to evaluate NLP systems on a fine-grained level. We conclude by arguing that when evaluating NLP systems, the community should utilize newly introduced RTE datasets that focus on specific linguistic phenomena.

Comment: 1st Workshop on Evaluation and Comparison for NLP systems (Eval4NLP) at EMNLP 2020; 18 pages

arXiv.org e-Print Archive

Crossref

Scholarship, Research, and Creative Work at Bryn Mawr College | Bryn Mawr College Research

On the Evaluation of Semantic Phenomena in Neural Machine Translation Using Natural Language Inference

Author: Belinkov Yonatan
Glass James
Poliak Adam
Van Durme Benjamin
Publication date: 01/01/2018

We propose a process for investigating the extent to which sentence representations arising from neural machine translation (NMT) systems encode distinct semantic phenomena. We use these representations as features to train a natural language inference (NLI) classifier based on datasets recast from existing semantic annotations. In applying this process to a representative NMT system, we find its encoder appears most suited to supporting inferences at the syntax-semantics interface, as compared to anaphora resolution requiring world knowledge. We conclude with a discussion on the merits and potential deficiencies of the existing process, and how it may be improved and extended as a broader framework for evaluating semantic coverage.

Comment: To be presented at NAACL 2018; 11 pages

arXiv.org e-Print Archive

Crossref

Hypothesis Only Baselines in Natural Language Inference

Author: Haldar Aparajita
Naradowsky Jason
Poliak Adam
Rudinger Rachel
Van Durme Benjamin
Publication date: 01/01/2018

We propose a hypothesis-only baseline for diagnosing Natural Language Inference (NLI). Especially when an NLI dataset assumes inference is occurring based purely on the relationship between a context and a hypothesis, it follows that assessing entailment relations while ignoring the provided context is a degenerate solution. Yet, through experiments on ten distinct NLI datasets, we find that this approach, which we refer to as a hypothesis-only model, is able to significantly outperform a majority-class baseline across a number of NLI datasets. Our analysis suggests that statistical irregularities may allow a model to perform NLI in some datasets beyond what should be achievable without access to the context.

Comment: Accepted at *SEM 2018 as long paper; 12 pages
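As a rough illustration of the comparison this abstract describes, the sketch below trains a classifier that sees only the hypothesis and compares it with a majority-class baseline. Everything in it is an assumption made for illustration: the toy examples are invented, and the Naive Bayes model (`HypothesisOnlyNB`, a hypothetical name) stands in for whatever model the paper actually uses.

```python
import math
from collections import Counter, defaultdict


class HypothesisOnlyNB:
    """Naive Bayes over hypothesis tokens; the context is never seen."""

    def fit(self, hypotheses, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for hyp, label in zip(hypotheses, labels):
            for tok in hyp.lower().split():
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)
        return self

    def predict(self, hypothesis):
        total = sum(self.label_counts.values())
        scores = {}
        for label, count in self.label_counts.items():
            # Laplace-smoothed log-likelihood of the hypothesis tokens.
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            score = math.log(count / total)
            for tok in hypothesis.lower().split():
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


def accuracy(predictions, gold):
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)


# Invented (context, hypothesis, label) triples; "nobody" leaks the label.
train = [
    ("A man plays guitar.", "Nobody is playing music.", "contradiction"),
    ("A dog runs in a park.", "An animal is outside.", "entailment"),
    ("Two kids eat lunch.", "Nobody is eating.", "contradiction"),
    ("A woman reads a book.", "A person is reading.", "entailment"),
    ("A boy kicks a ball.", "Nobody is playing.", "contradiction"),
]
test = [
    ("A chef cooks dinner.", "Nobody is cooking.", "contradiction"),
    ("A girl rides a bike.", "A person is outside.", "entailment"),
]

train_labels = [label for _, _, label in train]
test_labels = [label for _, _, label in test]

# Majority-class baseline: always predict the most frequent training label.
majority_label = Counter(train_labels).most_common(1)[0][0]
maj_acc = accuracy([majority_label] * len(test), test_labels)

# Hypothesis-only model: the context column is deliberately ignored.
model = HypothesisOnlyNB().fit([h for _, h, _ in train], train_labels)
hyp_acc = accuracy([model.predict(h) for _, h, _ in test], test_labels)

print(f"majority baseline: {maj_acc:.2f}, hypothesis-only: {hyp_acc:.2f}")
```

If the hypothesis-only accuracy clearly exceeds the majority baseline, as it does on this contrived data, the labels are partly predictable from the hypothesis alone.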

arXiv.org e-Print Archive

Crossref

Scholarship, Research, and Creative Work at Bryn Mawr College | Bryn Mawr College Research

Revisiting Recognizing Textual Entailment for Evaluating Natural Language Processing Systems

Author: Poliak Adam
Publication venue: 'The Busan Gyeongnam Mathematical Society'
Publication date: 16/02/2021

Recognizing Textual Entailment (RTE) began as a unified framework to evaluate the reasoning capabilities of Natural Language Processing (NLP) models. In recent years, RTE has evolved in the NLP community into a task that researchers focus on developing models for. This thesis revisits the tradition of RTE as an evaluation framework for NLP models, especially in the era of deep learning. Chapter 2 provides an overview of different approaches to evaluating NLP systems, discusses prior RTE datasets, and argues why many of them do not serve as satisfactory tests to evaluate the reasoning capabilities of NLP systems. Chapter 3 presents a new large-scale diverse collection of RTE datasets (DNC) that tests how well NLP systems capture a range of semantic phenomena that are integral to understanding human language. Chapter 4 demonstrates how the DNC can be used to evaluate reasoning capabilities of NLP models. Chapter 5 discusses the limits of RTE as an evaluation framework by illuminating how existing datasets contain biases that may enable crude modeling approaches to perform surprisingly well. The remaining aspects of the thesis focus on issues raised in Chapter 5. Chapter 6 addresses issues in prior RTE datasets focused on paraphrasing and presents a high-quality test set that can be used to analyze how robust RTE systems are to paraphrases. Chapter 7 demonstrates how modeling approaches on biases, e.g. adversarial learning, can enable RTE models to overcome biases discussed in Chapter 5. Chapter 8 applies these methods to the task of discovering emergency needs during disaster events.

JScholarship

Quantum-chemical study of C-H bond dissociation enthalpies of various small non-aromatic organic molecules

Author: Adam Vagánek
Peter Poliak
Publication date: 01/01/2013

In this work, C-H bond dissociation enthalpies (BDE) and vertical ionization potentials (IP) for various hydrocarbons and ketones were calculated using four density functional approaches. Calculated BDEs and IPs were correlated with experimental data. The linearity of the corresponding dependences can be considered very good. Comparing the two functionals used, B3LYP C-H BDE values are closer to experimental results than PBE0 values for both basis sets. The 6-31G* basis set, employed with both functionals, gives C-H BDEs closer to the experimental values than the 6-311++G** basis set. Using the obtained linear dependences BDE_exp = f(BDE_calc), the experimental values of C-H BDEs for some structurally related compounds can be estimated solely from calculations. As descriptors of the C-H BDE, the IPs and 13C NMR chemical shifts have been investigated using data obtained from the B3LYP/6-31G* calculations. There is a slight indication of linear correlation between IPs and C-H BDEs in the sets of simple alkanes and alkenes/cycloalkenes. However, for cycloalkanes and aliphatic carbonyl compounds, no linear correlation was found. In the case of the 13C NMR chemical shifts, a correlation with C-H BDEs can be found for the sets of alkanes and cycloalkanes, but for the other studied molecules, no trends were detected.

CiteSeerX

Collecting Diverse Natural Language Inference Problems for Sentence Representation Evaluation

Author: Haldar Aparajita
Hu J. Edward
Pavlick Ellie
Poliak Adam
Rudinger Rachel
Van Durme Benjamin
White Aaron Steven
Publication date: 01/01/2018

We present a large-scale collection of diverse natural language inference (NLI) datasets that help provide insight into how well a sentence representation captures distinct types of reasoning. The collection results from recasting 13 existing datasets from 7 semantic phenomena into a common NLI structure, resulting in over half a million labeled context-hypothesis pairs in total. We refer to our collection as the DNC: Diverse Natural Language Inference Collection. The DNC is available online at https://www.decomp.net, and will grow over time as additional resources are recast and added from novel sources.

Comment: To be presented at EMNLP 2018; 15 pages

arXiv.org e-Print Archive

Crossref

Scholarship, Research, and Creative Work at Bryn Mawr College | Bryn Mawr College Research

Evaluating Paraphrastic Robustness in Textual Entailment Models

Author: Lal Yash Kumar
Poliak Adam
Sinha Shreyashee
Van Durme Benjamin
Verma Dhruv
Publication date: 29/06/2023

We present PaRTE, a collection of 1,126 pairs of Recognizing Textual Entailment (RTE) examples to evaluate whether models are robust to paraphrasing. We posit that if RTE models understand language, their predictions should be consistent across inputs that share the same meaning. We use the evaluation set to determine if RTE models' predictions change when examples are paraphrased. In our experiments, contemporary models change their predictions on 8-16% of paraphrased examples, indicating that there is still room for improvement.
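The consistency check this abstract describes can be sketched as the share of (original, paraphrased) pairs on which a model's prediction differs. The code below is a minimal sketch under stated assumptions: `prediction_change_rate` and `overlap_model` are hypothetical names, the word-overlap "model" is a deliberately brittle stand-in rather than any system from the paper, and the four pairs are invented, not drawn from PaRTE.

```python
from typing import Callable, Iterable, Tuple

Example = Tuple[str, str]  # (premise, hypothesis)


def prediction_change_rate(
    predict: Callable[[str, str], str],
    pairs: Iterable[Tuple[Example, Example]],
) -> float:
    """Fraction of (original, paraphrase) pairs whose predicted label differs."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one pair")
    changed = sum(1 for orig, para in pairs if predict(*orig) != predict(*para))
    return changed / len(pairs)


def overlap_model(premise: str, hypothesis: str) -> str:
    """Toy stand-in: 'entailed' only if every hypothesis word is in the premise."""
    premise_words = set(premise.lower().split())
    hypothesis_words = set(hypothesis.lower().split())
    return "entailed" if hypothesis_words <= premise_words else "not-entailed"


# Invented pairs: each original example is followed by a paraphrase of it.
pairs = [
    (("a dog runs in the park", "a dog runs"),
     ("a dog is running in the park", "a dog runs")),
    (("two kids eat lunch", "kids eat"),
     ("two children eat lunch", "children eat")),
    (("a man sleeps", "a man sings"),
     ("a man is asleep", "a man sings")),
    (("she bought a car", "she bought a car"),
     ("she purchased a car", "she bought a car")),
]

rate = prediction_change_rate(overlap_model, pairs)
print(f"predictions changed on {rate:.0%} of paraphrased pairs")
```

A rate of zero would mean the model is fully consistent under paraphrase on this set; the metric says nothing about whether either prediction is correct.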

arXiv.org e-Print Archive